

Graph Stochastic Neural Networks for Semi-supervised Learning

Neural Information Processing Systems

Graph Neural Networks (GNNs) have achieved remarkable performance on the task of semi-supervised node classification. However, most existing models learn a deterministic classification function, which lacks the flexibility to explore better choices in the presence of imperfect observations such as scarce labeled nodes and noisy graph structure. To overcome the rigidity of deterministic classification functions, this paper proposes a novel framework named Graph Stochastic Neural Networks (GSNN), which models the uncertainty of the classification function by simultaneously learning a family of functions, i.e., a stochastic function. Specifically, we introduce a learnable graph neural network coupled with a high-dimensional latent variable to model the distribution of the classification function, and further adopt amortised variational inference to approximate the intractable joint posterior over the missing labels and the latent variable. By maximizing the lower bound of the likelihood of the observed node labels, the instantiated models can be trained effectively in an end-to-end manner. Extensive experiments on three real-world datasets show that GSNN achieves substantial performance gains in different scenarios compared with state-of-the-art baselines.
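
A minimal sketch of the idea in code: a GNN classifier conditioned on a sampled latent variable and trained by maximizing an ELBO. The two-layer architecture, the graph-level latent, and the KL weight below are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticGNN(nn.Module):
    """Toy GSNN-style model: a GCN whose prediction depends on a latent z."""
    def __init__(self, in_dim, hid_dim, n_classes, z_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)              # GCN feature transform
        self.enc = nn.Linear(hid_dim, 2 * z_dim)          # q(z | G, X): mean and log-variance
        self.dec = nn.Linear(hid_dim + z_dim, n_classes)  # classifier conditioned on z

    def forward(self, a_norm, x):
        h = F.relu(a_norm @ self.w1(x))                   # one propagation step
        mu, logvar = self.enc(h.mean(dim=0)).chunk(2)     # graph-level latent (assumption)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        logits = self.dec(torch.cat([h, z.expand(h.size(0), -1)], dim=-1))
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return logits, kl

def elbo_loss(logits, kl, y, labeled_mask, beta=1e-3):
    """Negative ELBO: cross-entropy on labeled nodes plus a weighted KL term."""
    return F.cross_entropy(logits[labeled_mask], y[labeled_mask]) + beta * kl
```

Sampling several values of z at test time yields a family of classification functions, whose disagreement can serve as an uncertainty estimate.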


e586a4f55fb43a540c2e9dab45e00f53-AuthorFeedback.pdf

Neural Information Processing Systems

We will incorporate your suggestions in the revised version. To Reviewer #1: Thank you for the comments. You have some misunderstandings of this paper. C1: It is similar to (conditional) graph generation. C2: It is similar to RGCN.


Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness

Fort, Stanislav, Lakshminarayanan, Balaji

arXiv.org Artificial Intelligence

Adversarial examples pose a significant challenge to the robustness, reliability and alignment of deep neural networks. We propose a novel, easy-to-use approach to achieving high-quality representations that lead to adversarial robustness through the use of multi-resolution input representations and dynamic self-ensembling of intermediate layer predictions. We demonstrate that intermediate layer predictions exhibit inherent robustness to adversarial attacks crafted to fool the full classifier, and propose a robust aggregation mechanism based on a Vickrey auction, which we call \textit{CrossMax}, to dynamically ensemble them. By combining multi-resolution inputs and robust ensembling, we achieve significant adversarial robustness on the CIFAR-10 and CIFAR-100 datasets without any adversarial training or extra data, reaching an adversarial accuracy of $\approx$72% (CIFAR-10) and $\approx$48% (CIFAR-100) on the RobustBench AutoAttack suite ($L_\infty = 8/255$) with a finetuned ImageNet-pretrained ResNet152. This is comparable with the top three models on CIFAR-10 and a +5% gain over the best current dedicated approach on CIFAR-100. Adding simple adversarial training on top, we get $\approx$78% on CIFAR-10 and $\approx$51% on CIFAR-100, improving SOTA by 5% and 9% respectively, with greater gains on the harder dataset. We validate our approach through extensive experiments and provide insights into the interplay between adversarial robustness and the hierarchical nature of deep representations. We show that simple gradient-based attacks against our model lead to human-interpretable images of the target classes as well as interpretable image changes. As a byproduct, using our multi-resolution prior, we turn pre-trained classifiers and CLIP models into controllable image generators and develop successful transferable attacks on large vision language models.
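
A hedged sketch of a CrossMax-style aggregation over intermediate-layer logits. The two max-subtraction normalizations follow the description above; reading "Vickrey auction" as "score each class by its k-th highest vote" and the default k are assumptions rather than a verified reimplementation.

```python
import torch

def crossmax(logits: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Aggregate logits of shape (n_predictors, n_classes) into one vector."""
    z = logits - logits.max(dim=1, keepdim=True).values  # per-predictor normalization
    z = z - z.max(dim=0, keepdim=True).values            # per-class normalization
    # Vickrey-auction flavor: use the k-th highest vote per class, so that a
    # single fooled predictor cannot single-handedly win the ensemble.
    return z.topk(k, dim=0).values[-1]

# Toy usage: 5 intermediate predictors over 10 classes.
print(crossmax(torch.randn(5, 10)).shape)  # torch.Size([10])
```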


Generalization bounds for regression and classification on adaptive covering input domains

Hwang, Wen-Liang

arXiv.org Machine Learning

Our main focus is on the generalization bound, which serves as an upper limit on the generalization error. Our analysis treats regression and classification tasks separately to ensure a thorough examination. For regression tasks, we assume the target function is real-valued and Lipschitz continuous, and we use the 2-norm and a root-mean-square-error (RMSE) variant to measure the disparities between predictions and actual values. For classification tasks, we treat the target function as a one-hot classifier, i.e., a piecewise constant function, and employ the 0/1 loss for error measurement. Our analysis underscores the differing sample complexity required to achieve a concentration inequality for the generalization bound, highlighting the variation in learning efficiency between regression and classification tasks. Furthermore, we demonstrate that the generalization bounds for regression and classification functions are inversely proportional to a polynomial of the number of parameters in a network, with the degree depending on the hypothesis class and the network architecture. These findings emphasize the advantages of over-parameterized networks and elucidate the conditions for benign overfitting in such systems.
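
Schematically, the size dependence claimed in the last sentence can be written as $\mathrm{GenBound}(f_\theta) \le C / W^{\alpha}$, where $W$ is the number of network parameters, $C$ is a problem-dependent constant, and the exponent $\alpha > 0$ depends on the hypothesis class and the network architecture; this is a paraphrase of the abstract, not a formula quoted from the paper.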


Do stable neural networks exist for classification problems? -- A new view on stability in AI

Liu, Z. N. D., Hansen, A. C.

arXiv.org Artificial Intelligence

In deep learning (DL) the instability phenomenon is widespread and well documented, most commonly via the classical measure of stability, the Lipschitz constant. While a small Lipschitz constant is traditionally viewed as guaranteeing stability, it does not capture the instability phenomenon in DL for classification well. The reason is that a classification function -- which is the target function to be approximated -- is necessarily discontinuous, and thus has an 'infinite' Lipschitz constant. As a result, the classical approach deems every classification function unstable, yet basic classification functions à la 'is there a cat in the image?' will typically be locally very 'flat' -- and thus locally stable -- except at the decision boundary. The lack of an appropriate measure of stability hinders a rigorous theory of stability in DL, and consequently there are no proper approximation-theoretic results that can guarantee the existence of stable networks for classification functions. In this paper we introduce a novel stability measure $\mathscr{S}(f)$, for any classification function $f$, appropriate for studying the stability of discontinuous functions and their approximations. We further prove two approximation theorems: First, for any $\epsilon > 0$ and any classification function $f$ on a \emph{compact set}, there is a neural network (NN) $\psi$ such that $\psi - f \neq 0$ only on a set of measure $< \epsilon$; moreover, $\mathscr{S}(\psi) \geq \mathscr{S}(f) - \epsilon$ (as accurate and stable as $f$ up to $\epsilon$). Second, for any classification function $f$ and $\epsilon > 0$, there exists a NN $\psi$ such that $\psi = f$ on the set of points that are at least $\epsilon$ away from the decision boundary.
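
A one-line example of the Lipschitz point (an illustration, not taken from the paper): the binary classifier $f(x) = \mathbf{1}_{x > 0}$ on $[-1, 1]$ has $\sup_{x \neq y} |f(x) - f(y)| / |x - y| = \infty$, since the jump at $x = 0$ can be crossed by arbitrarily close pairs $x, y$; yet $f$ is locally constant, and hence perfectly stable, everywhere except at the decision boundary.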


Model Stealing Attack against Multi-Exit Networks

Li, Pan, Lv, Peizhuo, Chen, Kai, Cai, Yuling, Xiang, Fan, Zhang, Shengzhi

arXiv.org Artificial Intelligence

Compared to traditional neural networks with a single exit, a multi-exit network has multiple exits that allow early output from intermediate layers of the model, bringing a significant improvement in computational efficiency while maintaining similar recognition accuracy. When attempting to steal such valuable models using traditional model stealing attacks, we found that conventional methods can only steal the model's classification function while failing to capture its output strategy. This results in a significant decrease in computational efficiency for the stolen substitute model, which thereby loses the advantages of multi-exit networks. In this paper, we propose the first model stealing attack that extracts both the model function and the output strategy. We employ Bayesian changepoint detection to analyze the target model's output strategy, and use a performance loss and a strategy loss to guide the training of the substitute model. Furthermore, we design a novel output strategy search algorithm that finds the optimal output strategy to maximize the consistency between the victim model's and the substitute model's outputs. Through experiments on multiple mainstream multi-exit networks and benchmark datasets, we thoroughly demonstrate the effectiveness of our method.
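
A hedged sketch of the strategy-recovery step: for a confidence-threshold exit policy, the victim's threshold at one exit can be estimated from query observations with a simple changepoint-style scan. This is a stand-in for the Bayesian changepoint detection described above; the threshold policy and every name below are assumptions.

```python
import numpy as np

def estimate_exit_threshold(confidences, exited):
    """confidences: max-softmax scores observed at one exit; exited: 1 if the
    victim stopped at this exit, else 0. Scan candidate thresholds and keep
    the one that best separates 'exited' from 'continued' queries."""
    best_t, best_acc = 0.0, 0.0
    for t in np.unique(confidences):
        acc = np.mean((confidences >= t) == exited.astype(bool))  # rule accuracy
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Toy usage: a victim that exits whenever confidence exceeds 0.8.
rng = np.random.default_rng(0)
conf = rng.uniform(0.0, 1.0, 1000)
print(estimate_exit_threshold(conf, (conf > 0.8).astype(int)))  # close to 0.8
```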